Visualizing Trees of a Random Forest Model: CPSC 547 Project Proposal
Abstract
A single classification tree can usually be visualized in one image without overwhelming the user with visual clutter. The difficulty arises when thousands of trees must be visualized at once. Random forests have become a very common data mining algorithm because of their high accuracy and their ability to handle large numbers of input variables. A random forest generates thousands of classification trees through bootstrap sampling and random selection of feature subsets, and makes a prediction for an input by majority vote over all the trees. Interpreting the forest is very difficult because it is infeasible to analyze every tree individually. A single tree, however, is easy to interpret and contains a lot of useful information. For example, the label of an interior node names the input variable used to partition the feature space into two groups, chosen to best separate the observations by target value. Each leaf node corresponds to the target value that occurs most often within its group. This project will focus on constructing a visualization system that supports the analysis of thousands of trees generated by a random forest.
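The mechanism described in the abstract (bootstrap sampling, a random feature subset, one split variable per interior node, majority-label leaves, and a majority vote across trees) can be illustrated with a minimal sketch. It is not the proposal's implementation: it uses only the Python standard library, every function name is hypothetical, and each "tree" is reduced to a single split (a decision stump) to keep the example short. A real random forest grows deeper trees and typically redraws the feature subset at every split.

```python
import random
from collections import Counter


def fit_stump(rows, labels, features):
    """Fit a one-split tree: choose the (feature, threshold) with fewest errors."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # not a real partition
            # Each leaf predicts the most frequent target value in its group.
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left)
            errors += sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    if best is None:
        # No usable split (e.g. all sampled rows identical): a single leaf.
        majority = Counter(labels).most_common(1)[0][0]
        return (None, None, majority, majority)
    return best[1:]


def predict_stump(stump, row):
    """Route a row to the left or right leaf and return that leaf's label."""
    f, t, left_label, right_label = stump
    if f is None:
        return left_label
    return left_label if row[f] <= t else right_label


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n = len(rows)
    all_features = list(range(len(rows[0])))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]    # sample rows with replacement
        feats = rng.sample(all_features, n_features)  # random subset of variables
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats)
        )
    return forest


def predict_forest(forest, row):
    """Return the label that receives the most votes across all trees."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


def split_feature_counts(forest):
    """Count how many trees split on each input variable (None = no split)."""
    return Counter(stump[0] for stump in forest)
```

Calling `fit_forest(rows, labels)` on a list of numeric tuples and a parallel list of class labels returns the list of stumps, and `predict_forest(forest, row)` returns the majority-vote label. A summary such as `split_feature_counts(forest)` is one simple example of the per-forest aggregate that a visualization of thousands of trees could start from.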
Publication date: 2014